ABSTRACT

Stepwise  multiple  linear  regression  has  proved  to  be  an  extremely 
useful  computational  technique  in  data  analysis  problems.  This 
procedure has been implemented in numerous computer programs and
overcomes the acute problem that often exists with the classical
computational  methods  of  multiple  linear  regression.  This  problem 
manifests  itself  through  the  excessive  computation  time  involved  in 
obtaining solutions to the 2^N - 1 sets of normal equations that arise
when  seeking  an  optimum  linear  combination  of  variables  from  the  subsets 
of  the  N  variables.  The  procedure  takes  advantage  of  recurrence 
relations  existing  between  covariances  of  residuals,  regression 
coefficients,  and  inverse  elements  of  partitions  of  the  covariance 
matrix.  The  application  of  these  recurrence  formulas  is  equivalent  to 
the  introduction  or  deletion  of  a  variable  into  a  linear  approximating 
function  which  is  being  sought  as  the  solution  to  a  data  analysis 
problem.  This  report  contains  derivations  of  the  recurrence  formulas, 
shows  how  they  are  implemented  in  a  computer  program  and  includes  an 
improved  algorithm  which  halves  the  storage  requirements  of  previous 
algorithms. A computer program for the BRLESC computer which incorporates
this procedure is described by the author and others in a previous
report, BRL Report No. 1330, July 1966. The present report is an
amplification  of  the  statistical  theory  and  computational  procedures 
presented  in  that  report  in  addition  to  the  exposition  of  the  improved 
algorithm. 
I. INTRODUCTION

The  computational  technique  for  stepwise  multiple  linear 
regression  described  by  M.  A.  Efroymson  [5]*  has  proved  to  be 
extremely  useful  in  data  analysis  problems.  This  procedure,  with 
various  modifications,  has  been  implemented  in  numerous  computer 
programs  in  government  laboratories,  universities,  and  industry  and 
overcomes  one  of  the  major  problems  that  often  exists  with  the 
classical**  computational  methods  of  multiple  linear  regression.  In 
problems  where  many  variables  are  involved,  one  may  have  only 
intuitive  suspicion  regarding  those  variables  which  may  be  significant. 
In  these  instances,  one  of  the  classical  approaches  is  to  obtain  the 
least-squares  solution  to  the  regression  equation  containing  all  the 
variables  that  are  believed  to  be  potentially  significant  and  then 
attempt  to  eliminate  insignificant  variables  by  tests  of  significance. 
This  procedure  is  of  limited  use  when  many  variables  are  involved  and 
usually  runs  into  extreme  computational  difficulty.  An  alternative 
procedure  is  to  examine  the  solutions  of  all  the  subset  models  that  can 

*Numbers in brackets denote references.

**The word "classical" here may be a misnomer in that the essential
substance of the computational procedure was proposed as early as
1934 by Horst [12] and 1938 by Cochran [4]. The recent interest in
the subject is of course due to the advent of modern high-speed
computing machinery.


be  formed  from  the  collection  of  variables  that  are  of  interest  and 
choose  the  one  which  seems  to  give  the  "best  fit."  This  procedure, 
however,  can  be  very  costly  in  terms  of  computation  time.  If  one  has 
N independent variables and wishes to obtain all possible solutions to
models containing 1, 2, ... and N variables one has to solve 2^N - 1 sets of
linear equations. For candidate models containing five variables this
would require the solution of 31 sets of linear equations (a practical
number) but for twenty variables this number jumps to 1,048,575. A
means  to  circumvent  this  computational  difficulty  is  provided  by 
stepwise  multiple  regression.  This  procedure  takes  advantage  of  the 
fact that the Gauss-Jordan algorithm, when used to solve the normal
equations  with  N  variables,  yields  intermediate  solutions  to  N 
regression  problems  containing  1,2,...  and  N  variables.  The  power  of 
the  procedure  lies  in  the  fact  that  the  variables  are  introduced  into 
the  regression  in  the  order  of  their  significance.  At  each  stage  the 
variable  which  is  entered  into  the  regression  is  the  one  which  will 
yield  the  greatest  reduction  in  the  sum  of  squares  of  residuals.  The 
power  of  the  procedure  is  further  enhanced  by  removing  terms  from 
regression  at  later  stages  that  have  become  insignificant  as  a  result 
of  the  inclusion  of  additional  variables  in  the  regression.  The 
computations  proceed  until  an  equilibrium  point  is  reached  where  no 
significant  reduction  in  the  sum  of  squares  of  residuals  is  to  be 
gained  by  adding  variables  in  the  regression  and  where  a  significant 
increase  in  the  sum  of  squares  of  residuals  would  arise  if  a  variable 
were  removed  from  regression.  The  procedure  described  above  will  be 
referred to as forward stepwise regression. A modification of the
method is to begin with all variables in regression and then remove
insignificant variables, one by one. In a fashion similar to the
forward regression, a variable which is removed from regression can
subsequently reenter if it becomes significant at a later stage. This
procedure  will  be  referred  to  as  backwards  stepwise  regression. 
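
The forward procedure described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: each candidate model is refitted by ordinary least squares instead of using the recurrence formulas of this report, the data are assumed to contain some noise (so that no sum of squares of residuals is exactly zero), and the names `residual_ss`, `forward_stepwise`, `f_in` and `f_out` (assumed F-to-enter and F-to-remove thresholds) are hypothetical.

```python
import numpy as np


def residual_ss(X, y, cols):
    """Sum of squares of residuals of y regressed on the columns `cols` of X.

    Variables are measured about their means, so no constant term is fitted.
    """
    yc = y - y.mean()
    if not cols:
        return float(yc @ yc)
    Xc = X[:, cols] - X[:, cols].mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, f_in=4.0, f_out=3.9):
    """Return selected column indices in order of entry (illustrative only).

    f_in and f_out are assumed thresholds with f_out < f_in.
    """
    m, p = X.shape
    active = []
    while True:
        # Removal step: drop the variable whose deletion raises the residual
        # sum of squares least, if that increase is not significant.
        if active:
            rss_now = residual_ss(X, y, active)
            dof = m - len(active) - 1
            f_remove = {
                j: (residual_ss(X, y, [c for c in active if c != j]) - rss_now)
                / (rss_now / dof)
                for j in active
            }
            worst = min(f_remove, key=f_remove.get)
            if f_remove[worst] < f_out:
                active.remove(worst)
                continue
        # Entry step: add the variable giving the greatest reduction in the
        # residual sum of squares, if that reduction is significant.
        candidates = [j for j in range(p) if j not in active]
        dof = m - len(active) - 2
        if not candidates or dof <= 0:
            return active
        rss_now = residual_ss(X, y, active)
        rss_new = {j: residual_ss(X, y, active + [j]) for j in candidates}
        best = min(rss_new, key=rss_new.get)
        f_enter = (rss_now - rss_new[best]) / (rss_new[best] / dof)
        if f_enter <= f_in:
            return active
        active.append(best)
```

The loop stops at the equilibrium point described above: no remaining variable gives a significant reduction, and no variable in regression can be dropped without a significant increase.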

The  optimum  or  ideal  sub-model  chosen  from  a  candidate  model 
can  be  defined  as  that  model  containing  only  variables  which  are 
statistically  significant  at  a  chosen  level  of  significance  and  which 
has  the  minimum  variance  of  residuals  among  the  sub-models  that  have 
all  terms  significant  at  that  level. 

In  general,  neither  version  of  stepwise  regression  yields  the 
optimum  model  but  in  most  cases  the  model  obtained  by  either  procedure 
comes  very  close  to  being  optimum  and  in  many  cases  is  identical  to 
that  obtained  by  the  costly  method  of  enumerating  all  the  solutions. 

In  those  instances  where  one  is  interested  in  finding  the 
optimum model, as defined above, the Gauss-Jordan algorithm greatly
reduces the required computations. The optimum path of elimination
for generating all possible stepwise combinations can be controlled by
a "binary algorithm" described by Lotto [14], 1961, and Garside [6],
1965. The procedure is optimized so that the computations go through
the  fewest  recursions.  Despite  this  optimization,  the  computational 
labor  is  such  that  the  procedure  seems  limited  to  handling  fewer  than 
twenty  variables. 
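
One ordering with the property described above, in which every subset model is reached from the previous one by a single introduction or removal, is the reflected binary Gray code. The sketch below is an assumption about how such a path could be generated; it is not claimed to reproduce the cited algorithms in detail, and the name `gray_code_pivots` is hypothetical.

```python
def gray_code_pivots(n_vars):
    """Yield (pivot, subset) pairs visiting every non-empty subset exactly once.

    Consecutive subsets differ by one variable, so each step corresponds to a
    single introduction or removal of the variable `pivot`.
    """
    in_regression = [False] * n_vars
    for step in range(1, 2 ** n_vars):
        # In the reflected Gray code, the bit that changes at `step` is the
        # lowest set bit of `step`.
        pivot = (step & -step).bit_length() - 1
        in_regression[pivot] = not in_regression[pivot]
        yield pivot, tuple(i for i, flag in enumerate(in_regression) if flag)


if __name__ == "__main__":
    for pivot, subset in gray_code_pivots(3):
        print(pivot, subset)
```

For N variables the generator takes 2^N - 1 steps, which is the count of subset models quoted in the Introduction; the exponential growth is why the approach remains limited to small N.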

The paper by Efroymson contains mostly a description of the
computational  procedure.  This  report  contains  derivations  of  the 
pertinent  mathematical  equations  related  to  the  procedure  including 
the  recurrence  formulas  relating  covariances  of  residuals,  regression 
coefficients,  and  elements  of  the  inverse  of  partitions  of  the 
covariance  matrix.  An  improvement  of  the  algorithm  used  by 
Efroymson  is  derived.  This  improved  algorithm  reduces  the  storage 
requirement by 50%, thus allowing the analysis of larger models or the
use of double precision arithmetic. This latter consideration is
quite important when analyzing models containing many variables. In
addition,  a  numerical  example  is  presented  showing  the  differing 
results  that  can  be  obtained  by  the  backward  and  forward  versions  of 
the  procedure. 

A  computer  program  for  BRLESC  (Ballistic  Research  Laboratories 
Electronic  Scientific  Computer)  which  incorporates  this  procedure  is 
described  by  the  author  and  others  in  a  previous  report,  BRL  Report 
No.  1330,  July  1966.  The  present  report  is  an  amplification  of  the 
statistical  theory  and  computational  procedures  presented  in  that 
report  in  addition  to  the  exposition  of  the  improved  algorithm. 
II. MULTIPLE LINEAR REGRESSION


The theory of multiple linear regression and correlation is
contained  in  the  theory  of  "Linear  Statistical  Models"  and  can  be 
found  in  many  widely  used  texts  such  as  that  by  Graybill  [7].  The 
concept  of  a  linear  model  is  fundamental  to  the  ensuing  exposition  and 
hence  the  definition  found  in  Graybill  is  listed.  By  a  linear  model 
is  meant  "an  equation  that  relates  random  variables,  mathematical 
variables,  and  parameters  and  that  is  linear  in  the  parameters  and  in 
the  random  variables."  Linear  models  are  classified  into  several 
categories  depending  on  the  distribution  of  the  variables,  the  presence 
and  nature  of  errors  when  observing  the  variables,  and  in  the  nature 
of  the  variables  themselves,  i.e.,  whether  the  variables  are 
mathematical  variables  or  random  variables.  The  equation  relating  the 
variables is written in the form

$$X_n = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_{n-1} X_{n-1}. \tag{1}$$

The variables $X_1, X_2, \ldots, X_{n-1}$ are referred to as "independent
variables" and $X_n$ as the dependent variable. In some instances one
is interested in polynomial or curvilinear models and the variables
$X_1, X_2, \ldots$ are not necessarily independent in the probability
sense. For example the model

$$X_2 = b_1 X_1 + b_2 \cos X_1 + b_3 e^{X_1} \tag{2}$$

is curvilinear, i.e., linear in the parameters $b_1$, $b_2$ and $b_3$ even
though nonlinear in $X_1$. This model fits into the framework of
Equation (1) when the transformed variables $\cos X_1$ and $e^{X_1}$ are
introduced as additional independent variables. This model is contrasted
with the model

$$X_2 = b_1 e^{b_2 X_1} + b_3 \cos b_4 X_1 \tag{3}$$

which is nonlinear in the parameters $b_1$, $b_2$, $b_3$ and $b_4$ and cannot be
linearized  by  transformations.  This  problem  is  one  of  nonlinear 
regression  and  is  not  discussed  further  in  this  report. 
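
As an illustration of the curvilinear case, the model of Equation (2) can be fitted by ordinary least squares once the transformed variables are formed as columns. The sketch below uses synthetic data; the coefficient values, noise level, and variable names are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations of X1 and of the dependent variable (assumed values).
x1 = np.linspace(0.0, 2.0, 50)
true_b = np.array([1.5, -2.0, 0.5])

# Transformed variables: the model is linear in b once these columns exist.
design = np.column_stack([x1, np.cos(x1), np.exp(x1)])
y = design @ true_b + 1e-3 * rng.normal(size=x1.size)

# Least-squares estimate of b1, b2, b3.
b_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(b_hat)
```

No such device is available for Equation (3), because its parameters appear inside the exponential and cosine terms.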

In multiple linear regression one is interested in obtaining
an estimate of the $b_i$ which will yield a "prediction equation"
represented by Equation (1) which best fits a set of observations.
The m sets of observations of $X_n$, the dependent variable, and of
$X_1, X_2, \ldots, X_{n-1}$ can be written as $X_{ij}$, $i = 1, 2, \ldots, m$,
$j = 1, 2, \ldots, n$. When the variables are measured about their
respective means, Equation (1) can be written

$$X_n - \bar{X}_n = b_1 (X_1 - \bar{X}_1) + b_2 (X_2 - \bar{X}_2) + \cdots + b_{n-1} (X_{n-1} - \bar{X}_{n-1}). \tag{4}$$

The coefficient $b_0$ in Equation (1) is obtained from the relationship

$$b_0 = \bar{X}_n - \sum_{i=1}^{n-1} b_i \bar{X}_i. \tag{5}$$

Hereafter the variables will be assumed to be measured about their
respective means and the quantity $x_{ij}$ will be used to represent $X_{ij} - \bar{X}_j$.


For a particular observation $j$, Equation (4) takes the form

$$x_{jn} = b_1 x_{j1} + b_2 x_{j2} + \cdots + b_{n-1} x_{j,n-1} + e_j. \tag{6}$$

$e_j$ is a residual and is the difference between the predicted value
and the observed value of $X_n$*. The least-squares method of estimating
the coefficients $b_i$ is based on the minimization of the sum of the
squares of the residuals, denoted as $E^2$.

$$E^2 = \sum_{j=1}^{m} e_j^2 = \sum_{j=1}^{m} \left( x_{jn} - b_1 x_{j1} - b_2 x_{j2} - \cdots - b_{n-1} x_{j,n-1} \right)^2. \tag{7}$$

This minimization is achieved by taking partial derivatives of $E^2$ with
respect to each of the $b_i$ and equating each of these (n-1) equations to
zero. This leads to the normal equations

$$\sum_{j=1}^{m} x_{jk} \left( x_{jn} - b_1 x_{j1} - b_2 x_{j2} - \cdots - b_{n-1} x_{j,n-1} \right) = 0, \qquad k = 1, 2, \ldots, n-1. \tag{8}$$

*It should be noted that the variables $X_i$, i = 1, 2, ... n, are assumed
to be measured without error.

The normal equations can be written in matrix form

$$X'X B = X'Y. \tag{9}$$

X is the m×(n-1) matrix of observations of the independent variables,
X' its transpose, Y is the m×1 matrix of observations of the dependent
variable and B is the column vector of (n-1) regression coefficients.
The solution of the normal equations to obtain the regression
coefficients is given as

$$B = (X'X)^{-1} X'Y, \tag{10}$$

where $(X'X)^{-1}$ is the inverse of the matrix X'X. The normal equations
can  be  solved  by  any  of  several  algorithms  for  the  solution  of  systems 
of linear equations; however, the Gauss-Jordan algorithm is used in
stepwise  multiple  regression  for  reasons  that  will  become  apparent. 
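
A minimal numerical sketch of Equations (4), (5), (9) and (10) follows. It solves the normal equations with a general linear solver rather than the Gauss-Jordan scheme discussed later, and the function name `fit_normal_equations` is hypothetical.

```python
import numpy as np


def fit_normal_equations(X_raw, y_raw):
    """Return (b0, B) for y = b0 + X B by solving X'X B = X'Y.

    X_raw is m x (n-1); y_raw has length m. Variables are first measured
    about their means, as in Equation (4).
    """
    x_mean = X_raw.mean(axis=0)
    y_mean = y_raw.mean()
    X = X_raw - x_mean
    Y = y_raw - y_mean
    # Solve the normal equations (9) directly instead of forming the inverse.
    B = np.linalg.solve(X.T @ X, X.T @ Y)
    # Equation (5): constant term recovered from the means.
    b0 = y_mean - x_mean @ B
    return b0, B
```

Solving the system is generally preferable numerically to forming $(X'X)^{-1}$ explicitly, although the inverse elements are needed later for significance tests.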


III. COMPUTATIONAL CONSIDERATIONS IN MULTIPLE LINEAR REGRESSION


The  most  severe  computational  problem  occurring  in  multiple 
linear  regression  is  the  formation  and  solution  of  the  normal  equations. 
For  any  problem  containing  more  than  a  few  variables  and  observations 
this  problem  can  become  too  laborious  for  desk  calculation  and  the  use 
of  high  speed  computers  is  very  desirable.  As  a  consequence, 
generalized  library  programs  for  doing  multiple  regression  computations 
are  widely  available  and  can  be  obtained  in  most  computing  facilities. 

In  general  it  is  desirable  for  these  programs  to  do  more  than  compute 
regression coefficients and variance of residuals; they should also
provide associated statistical data that could be used for significance
tests, computing prediction intervals, etc. These considerations are
discussed by Slater [6], 1961 and by Healy [11], 1963. These
programs should be designed as efficiently as possible to keep the
computation time reasonably small. Since the Gauss-Jordan algorithm
provides the solution to (n-1) regression models en route to solving
the complete problem at essentially no significant increase in cost
compared to other algorithms, it seems that wherever any library program for
multiple  regression  is  prepared,  the  program  should  incorporate  the 
stepwise  scheme.  Such  a  program  could  then  be  used  either  to  provide 
only  the  complete  solution  or  to  select  the  significant  variables  for 
inclusion  in  the  output  model. 
The programming effort required to include the optional
capabilities  for  both  forward  stepwise  regression  and  backward  stepwise 
regression  is  relatively  small  compared  to  the  total  programming 
effort  required  to  prepare  either  program.  For  this  reason  it  seems 
worthwhile  that  a  well  designed  computer  program  should  provide  a 
capability  for  both  types  of  computations.  The  relative  advantages 
and  disadvantages  of  the  two  procedures  will  be  discussed  in  a  later 
section.  The  effort  required  to  prepare  the  matrix  elements  to  begin 
the  backward  stepwise  regression  is  identical  to  the  effort  required 
to  perform  a  complete  forward  regression.  Because  of  this  it  seems 
advisable  that  when  the  backward  option  is  selected,  the  program  should 
be  controlled  in  a  manner  which  yields  the  results  of  a  normal  forward 
regression as a by-product. When proceeding forward, the various
solutions obtained may correspond to models of the form:


$$X_n = b_0 + b_1 X_1 + b_3 X_3 \tag{11}$$

$$X_n = b_0' + b_1' X_1 + b_3' X_3 + b_7' X_7$$


At  each  stage  the  program,  at  a  minimum,  should  print  the  standard 
deviation  of  residuals  and  identify  the  variables  entered  or  removed. 
This  information  can  then  prove  to  be  invaluable  if  one  chooses  a 
simpler  model  than  the  one  finally  selected  by  the  stepwise  regression 
procedure.
IV. MATHEMATICAL BASIS OF THE STEPWISE REGRESSION


The mathematical basis of the stepwise regression is that the
transformation rules of the Gauss-Jordan algorithm correspond to
recurrence relations that exist between covariances of residuals,
regression coefficients, and inverse elements of partitions of the
covariance matrix. These relations can readily be derived by taking
advantage of Yule's notation as described by Kendall [13]. In this
notation the regression Equation (1) is written as follows:

$$X_n = b_{n1.23\ldots n-1} X_1 + b_{n2.13\ldots n-1} X_2 + \cdots + b_{n,n-1.12\ldots n-2} X_{n-1}$$

The first subscript of each b is that corresponding to the dependent
variable, the second subscript corresponds to the variable attached to
the regression coefficient. These two subscripts are called the
primary subscripts. The remaining subscripts on the right of the
period are those of the remaining variables and are called secondary
subscripts. The entire collection of subscripts for those variables
that are in regression is thus represented by those subscripts to the
right of the period with the addition of the subscript to the
immediate left of the period. It should be noted that on a regression
coefficient neither of the primary subscripts can ever be included in
the secondary subscripts.
In a similar notation the residuals are denoted as
$X_{n.12\ldots(n-1)}$. The subscript to the left of the period is that of the
dependent variable and those to the right are the subscripts of the
independent variables in the regression. Since regressions containing
fewer than the (n-1) independent variables will be of interest it is
necessary to introduce the following notation. The subscript q will
be used to represent the collection of subscripts 1 through (k-1) with
the exclusion of i and j, i.e.,

$$q = 1, 2, \ldots, (i-1), (i+1), \ldots, (j-1), (j+1), \ldots, (k-1).$$

Any variable can be considered as the dependent variable, e.g.,
the residuals $X_{i.q}$ and $X_{j.q}$ will be utilized in deriving the recurrence
relations. The covariance of the variables $X_i$ and $X_j$ is defined as

$$s_{ij} = \sum X_i X_j / f,$$

where f is the degrees of freedom and the summation extends over the
m data points. For the present f will be defined as m and therefore
does not vary as the number of variables in regression varies. The
covariance of residuals is defined as

$$s_{ij.q} = \sum X_{i.q} X_{j.q} / f.$$

*Hereafter, unless denoted otherwise, all summations extend over the
m data points.

The secondary subscripts of a covariance indicate the variables in the
regression. When using this notation neither of the primary subscripts
can be included in the secondary subscripts. The collection of
variables whose subscripts are contained in q is always assumed to be
in regression; however, additional variables such as $X_i$ and $X_j$ (whose
subscripts are not contained in q) may also be in regression. For a
covariance the presence of this situation is denoted as follows:

$$s_{kk.qij} = \sum X_{k.12\ldots(k-1)}^2 / f.$$

Similar  notation  will  be  used  for  the  regression  coefficients  and  for 
elements  of  the  inverse  of  partitions  of  the  covariance  matrix. 

In  the  above  notation,  the  normal  equations  (for  the  entire 
collection  of  variables)  can  be  written  in  the  form 

$$\sum X_{n.12\ldots n-1} X_r = 0, \qquad r = 1, 2, \ldots, n-1$$

or equivalently

$$s_{1r} b_{n1.23\ldots(n-1)} + s_{2r} b_{n2.13\ldots(n-1)} + \cdots + s_{(n-1)r} b_{n(n-1).12\ldots(n-2)} = s_{nr}, \qquad r = 1, 2, \ldots, n-1.$$

The complete covariance matrix is:

$$\begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1n} \\ s_{21} & s_{22} & \cdots & s_{2n} \\ \vdots & \vdots & & \vdots \\ s_{n1} & s_{n2} & \cdots & s_{nn} \end{pmatrix} \tag{15}$$

This matrix corresponds to the augmented matrix of coefficients usually
considered in solving a system of linear equations with the addition
of the nth row. The nth row is added so that the variance of
residuals, $s_{nn.q}$, will be made available through the recurrence formulas,
thus avoiding the need for computing residuals at each stage.

Derivation  of  Recurrence  Formulas 

In  deriving  the  recurrence  formulas  it  is  convenient  to  take 
note  of  Kendall's  [13]  three  observations: 

(a) The covariance of any residual and any variable is zero
provided that the subscript of the variable occurs among the secondary
subscripts of the residual, i.e., $\sum X_{j.qi} X_i = 0$.

(b) The covariance of any two residuals is zero provided that
the subscripts of either residual are contained in the secondary
subscripts of the other, i.e., $\sum X_{i.q} X_{j.qi} = 0$.

(c) The covariance of any two residuals is unaltered by
omitting any or all terms in either residual whose secondary
subscripts are contained in the secondary subscripts of the other
residual, i.e., $\sum X_{i.q} X_{j.qi} = \sum X_{i.q} (X_j - b_{ji.q} X_i)$.

Statement (a) is merely a statement of the normal equations; (b) and
(c)  arise  as  a  consequence  of  (a). 
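
The three observations can be checked numerically on arbitrary data. The sketch below is only a sanity check under stated assumptions: residuals are formed by ordinary least squares on mean-centered synthetic data, and the helper name `residual` and the index choices are hypothetical.

```python
import numpy as np


def residual(data, target, regressors):
    """Residual of column `target` after regression on columns `regressors`."""
    y = data[:, target]
    if not regressors:
        return y.copy()
    Z = data[:, regressors]
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ coef


rng = np.random.default_rng(1)
data = rng.normal(size=(40, 4)) @ rng.normal(size=(4, 4))  # correlated columns
data -= data.mean(axis=0)  # variables measured about their means

i, j, q = 0, 1, [2, 3]
x_iq = residual(data, i, q)         # X_{i.q}
x_jqi = residual(data, j, q + [i])  # X_{j.qi}

# (a) residual X_{j.qi} against the variable X_i, whose subscript is secondary.
check_a = x_jqi @ data[:, i]
# (b) residuals X_{i.q} and X_{j.qi}.
check_b = x_iq @ x_jqi
# (c) terms in the variables of q may be omitted from X_{j.qi}.
b_ji_q = np.linalg.lstsq(data[:, [i] + q], data[:, j], rcond=None)[0][0]
check_c = x_iq @ (data[:, j] - b_ji_q * data[:, i])
# (c) again, in the form used in the derivations: X_{j.q} may be replaced by X_j.
lhs = x_iq @ residual(data, j, q)
rhs = x_iq @ data[:, j]
```

Up to rounding error, `check_a`, `check_b` and `check_c` vanish and `lhs` equals `rhs`.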

The  actual  value  of  a  recurrence  formula  in  computation  is 
dependent  upon  the  availability  of  all  the  elements  entering  in  the 
recurrence  except  the  one  to  be  determined.  With  this  in  mind  the 
ensuing recurrences are derived and their relationship to the Gauss-Jordan
algorithm will be exhibited. Furthermore it will be shown that
the algorithm can be used without modification in a backwards
recursion, i.e., once a term is in regression it can be removed by the
same algorithm. Altogether 18 recurrence relations are of interest.
Nine of these correspond to the introduction of variables in regression
and the remaining nine correspond to the removal of variables from the
regression. It will be shown that these 18 recurrence formulas are
equivalent to the four rules of the Gauss-Jordan algorithm. The
elements of the derivations do not necessitate any particular sequencing
of the digits in q (the sequence has been assumed for simplicity) and
hold true for arbitrary i, j and k. The presence of $X_i$, $X_j$ and $X_k$ in
regression (or not) will be denoted by the notation introduced
previously.

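From (c)

$$\sum X_{k.q} X_{j.qk} = 0 = \sum X_{k.q} (X_j - b_{jk.q} X_k),$$

$$\sum X_{k.q} X_j = \sum X_{k.q} X_{j.q}, \qquad \sum X_{k.q} X_k = \sum X_{k.q} X_{k.q}.$$

Hence

$$\sum X_{k.q} X_{j.q} = b_{jk.q} \sum X_{k.q} X_{k.q}.$$

Dividing by f

$$b_{jk.q} = s_{kj.q}/s_{kk.q} = s_{jk.q}/s_{kk.q}. \tag{16}$$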
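Applying the same argument to the residuals $X_{i.qk}$ and $X_{j.qk}$ gives

$$\sum X_{i.qk} X_{j.qk} = \sum X_{i.q} X_{j.q} - b_{ik.q} \sum X_{k.q} X_{j.q}.$$

Hence

$$s_{ij.qk} = s_{ij.q} - s_{ik.q} s_{kj.q}/s_{kk.q}. \tag{18}$$

From Equations (16) and (18)

$$b_{ji.qk} = b_{ji.q} - \frac{s_{ik.q} \left( s_{kj.q} s_{ii.q} - s_{ki.q} s_{ij.q} \right)}{s_{ii.q} \left( s_{ii.q} s_{kk.q} - s_{ik.q} s_{ki.q} \right)}$$

or

$$b_{ji.qk} = b_{ji.q} - b_{ki.q} s_{kj.qi}/s_{kk.qi}. \tag{19}$$

Similarly

$$-b_{ij.qk} = -b_{ij.q} - (-b_{kj.q}) s_{ik.qj}/s_{kk.qj},$$

or, writing $d_{ij.q} = -b_{ij.q}$,

$$d_{ij.qk} = d_{ij.q} - s_{ik.qj} d_{kj.q}/s_{kk.qj}.$$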


Elements  of  the  Inverse  Matrix 


Consider the partition of the covariance matrix formed by
taking all the rows and columns of indices q, i, j, k. Denote the
determinant of this matrix as R and the cofactor of the element $s_{ij}$ as
$R_{ij}$. Since the covariance matrix is symmetrical, $R_{ij} = R_{ji}$. From
Cramer's rule

$$b_{ij.qk} = -R_{ij}/R_{ii}.$$

$$
\begin{aligned}
s_{ii.qjk} &= \sum X_{i.qjk}^2 / f = \sum X_{i.qjk} X_i / f \\
&= s_{ii} - \sum_{t=q,j,k} b_{it.12\ldots(i-1)(i+1)\ldots(t-1)(t+1)\ldots k}\, s_{it} \\
&= s_{ii} + \frac{1}{R_{ii}} \sum_{t=q,j,k} s_{it} R_{it} \\
&= \sum_{t=q,i,j,k} s_{it} R_{it} / R_{ii}.
\end{aligned}
$$

From the Laplace expansion theorem

$$R = \sum_{t=q,i,j,k} s_{it} R_{it}.$$

Hence

$$s_{ii.qjk} = R/R_{ii}. \tag{22}$$

From Equation (16)

$$s_{ij.qk} = b_{ji.qk}\, s_{ii.qk} = -R_{ji}/R_{ii \cdot jj} = -R_{ij}/R_{ii \cdot jj}. \tag{23}$$

$R_{ii \cdot jj}$* is the cofactor of the second order minor in R which is obtained
by striking out row i and column i and then row j and column j.


The i,jth element of the inverse of the partition of the
covariance matrix defined above is denoted as $c_{ij.qijk}$. The only
inverse elements which will be of interest are those elements which are
inverse elements of partitions defined by taking the rows and columns
subscripted by the subscripts of the variables in regression. Hence
the primary subscripts of the inverse elements will always be included
in the secondary subscripts. As in the case of covariances, the
secondary subscripts will denote the variables in regression. From
fundamentals of matrix algebra


$$c_{ij.qijk} = R_{ij}/R.$$

*This notation is taken from Gutman [8].


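The formulas derived to this point are those for forward
recursion, or for the addition of variables into the regression.
Similar formulas are now derived for backward recursion.

From Equation (25)

$$b_{ki.qj} = -c_{ik.qijk}\, s_{kk.qij} = -c_{ik.qijk}/c_{kk.qijk}. \tag{29}$$

Similarly

$$d_{kj.qi} = c_{kj.qijk}/c_{kk.qijk}. \tag{30}$$

From Equation (26)

$$c_{ij.qij} = c_{ij.qijk} + b_{ki.qj}\, d_{kj.qi}/s_{kk.qij}.$$

Substituting for $b_{ki.qj}$ and $d_{kj.qi}$

$$
\begin{aligned}
c_{ij.qij} &= c_{ij.qijk} - \left( c_{ik.qijk}\, s_{kk.qij} \right) \left( c_{jk.qijk}/c_{kk.qijk} \right) / s_{kk.qij} \\
&= c_{ij.qijk} - c_{ik.qijk}\, c_{jk.qijk}/c_{kk.qijk}.
\end{aligned}
\tag{31}
$$

From Equation (18)

$$
\begin{aligned}
s_{ij.q} &= s_{ij.qk} + s_{ik.q}\, s_{kj.q}/s_{kk.q} \\
&= s_{ij.qk} + b_{ik.q}\, s_{kk.q}\, b_{jk.q}\, s_{kk.q}/s_{kk.q}
\end{aligned}
$$

or

$$s_{ij.q} = s_{ij.qk} - d_{ik.q}\, b_{jk.q}/c_{kk.qk}. \tag{32}$$

From Equation (27)

$$s_{kk.qij} = 1/c_{kk.qijk}. \tag{33}$$

From Equation (19)

$$
\begin{aligned}
b_{ji.q} &= b_{ji.qk} + b_{ki.q}\, s_{kj.qi}/s_{kk.qi} \\
&= b_{ji.qk} - c_{ik.qik}\, s_{kk.qi}\, b_{jk.qi} / \left( c_{kk.qik}\, s_{kk.qi} \right) \\
&= b_{ji.qk} - c_{ik.qik}\, b_{jk.qi}/c_{kk.qik}.
\end{aligned}
\tag{34}
$$

Similarly

$$-b_{ij.q} = -b_{ij.qk} - c_{jk.qjk} (-b_{ik.qj})/c_{kk.qjk},$$

$$d_{ij.q} = d_{ij.qk} - d_{ik.qj}\, c_{jk.qjk}/c_{kk.qjk}. \tag{35}$$

From Equation (16)

$$s_{kj.q} = b_{jk.q}\, s_{kk.q} = b_{jk.q}/c_{kk.qk}. \tag{36}$$

Similarly

$$s_{ik.q} = b_{ik.q}/c_{kk.qk} = -d_{ik.q}/c_{kk.qk}. \tag{37}$$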

The eighteen recurrence formulas are listed below in a convenient order.
The successive application of these formulas to
appropriate matrix elements is the basis of stepwise multiple linear
regression. The matrix elements are continually replaced at each
stage by the matrix elements of the new stage. The initial matrix is
the covariance matrix, Equation (15). Each stage is characterized by
the presence of a particular set of independent variables in the
regression. In practice the variables will not enter the regression
in sequence, but in an order determined by their ability to reduce the
variance of residuals. For the present we can assume that as the
variables enter the regression they are reordered. The end effect
(after the reordering) is that the variables are introduced into the
regression in the order $X_1, X_2, \ldots, X_k$; hence, the kth stage is
characterized by the presence of $X_1, X_2, \ldots, X_k$ in regression.

List of Recurrence Formulas

1. $c_{ij.qijk} = c_{ij.qij} - b_{ki.qj}\, d_{kj.qi}/s_{kk.qij}$
2. $c_{ik.qijk} = -b_{ki.qj}/s_{kk.qij}$
3. $b_{ji.qk} = b_{ji.q} - b_{ki.q}\, s_{kj.qi}/s_{kk.qi}$
4. $c_{kj.qijk} = d_{kj.qi}/s_{kk.qij}$
5. $c_{kk.qijk} = 1/s_{kk.qij}$
6. $b_{jk.q} = s_{kj.q}/s_{kk.q}$
7. $d_{ij.qk} = d_{ij.q} - d_{kj.q}\, s_{ik.qj}/s_{kk.qj}$
8. $d_{ik.q} = -s_{ik.q}/s_{kk.q}$
9. $s_{ij.qk} = s_{ij.q} - s_{ik.q}\, s_{kj.q}/s_{kk.q}$
10. $c_{ij.qij} = c_{ij.qijk} - c_{ik.qijk}\, c_{jk.qijk}/c_{kk.qijk}$
11. $b_{ki.qj} = -c_{ik.qijk}/c_{kk.qijk}$
12. $b_{ji.q} = b_{ji.qk} - c_{ik.qik}\, b_{jk.qi}/c_{kk.qik}$
13. $d_{kj.qi} = c_{kj.qijk}/c_{kk.qijk}$
14. $s_{kk.qij} = 1/c_{kk.qijk}$
15. $s_{kj.q} = b_{jk.q}/c_{kk.qk}$
16. $d_{ij.q} = d_{ij.qk} - d_{ik.qj}\, c_{jk.qjk}/c_{kk.qjk}$
17. $s_{ik.q} = -d_{ik.q}/c_{kk.qk}$
18. $s_{ij.q} = s_{ij.qk} - d_{ik.q}\, b_{jk.q}/c_{kk.qk}$

Theorem  on  Stepwise  Multiple  Linear  Regression 

Consider the sequence of matrices $A_0, A_1, \ldots, A_{n-1}$. $A_0$ is the
covariance matrix, Equation (15). $A_k$ ($k = 1, 2, \ldots, n-1$) is the matrix
formed by applying the transformation

$$
\begin{aligned}
a^k_{ij} &= a^{k-1}_{ij} - a^{k-1}_{ik}\, a^{k-1}_{kj} / a^{k-1}_{kk}, & i, j &= 1, 2, \ldots, (k-1), (k+1), \ldots, n \\
a^k_{ik} &= -a^{k-1}_{ik} / a^{k-1}_{kk}, & i &= 1, 2, \ldots, (k-1), (k+1), \ldots, n \\
a^k_{kj} &= a^{k-1}_{kj} / a^{k-1}_{kk}, & j &= 1, 2, \ldots, (k-1), (k+1), \ldots, n \\
a^k_{kk} &= 1 / a^{k-1}_{kk}, & i &= j = k
\end{aligned}
\tag{38}
$$

to the matrix $A_{k-1}$. $a^k_{ij}$ is the i,jth element of the matrix $A_k$.
Denote this transformation as $T_k$. The results of applying this
transformation are contained in the following theorem:


THEOREM:


The matrix $A_k$ contains four partitions, the respective
partitions having elements as follows:

$$
\begin{aligned}
a^k_{ij} &= c_{ij.12\ldots k}, & i &= 1, 2, \ldots, k, & j &= 1, 2, \ldots, k \\
a^k_{ij} &= b_{ji.12\ldots i-1,i+1\ldots k}, & i &= 1, 2, \ldots, k, & j &= k+1, k+2, \ldots, n \\
a^k_{ij} &= d_{ij.12\ldots j-1,j+1\ldots k}, & i &= k+1, k+2, \ldots, n, & j &= 1, 2, \ldots, k \\
a^k_{ij} &= s_{ij.12\ldots k}, & i &= k+1, k+2, \ldots, n, & j &= k+1, k+2, \ldots, n
\end{aligned}
\tag{39}
$$


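The proof is by induction. Assume that the theorem holds for $A_{k-1}$,
then show that it necessarily must hold for $A_k$ and furthermore that
it holds for k = 1. The matrix $A_{k-1}$ can be partitioned as follows:

$$
A_{k-1} = \begin{pmatrix}
A_{k-1,1} & A_{k-1,2} & A_{k-1,3} \\
A_{k-1,4} & A_{k-1,5} & A_{k-1,6} \\
A_{k-1,7} & A_{k-1,8} & A_{k-1,9}
\end{pmatrix}
\tag{40}
$$

$$
= \left( \begin{array}{cccc|c|ccc}
c_{11} & c_{12} & \cdots & c_{1,k-1} & b_{k1} & b_{k+1,1} & \cdots & b_{n1} \\
c_{21} & c_{22} & \cdots & c_{2,k-1} & b_{k2} & b_{k+1,2} & \cdots & b_{n2} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
c_{k-1,1} & c_{k-1,2} & \cdots & c_{k-1,k-1} & b_{k,k-1} & b_{k+1,k-1} & \cdots & b_{n,k-1} \\
\hline
d_{k1} & d_{k2} & \cdots & d_{k,k-1} & s_{kk} & s_{k,k+1} & \cdots & s_{kn} \\
\hline
d_{k+1,1} & d_{k+1,2} & \cdots & d_{k+1,k-1} & s_{k+1,k} & s_{k+1,k+1} & \cdots & s_{k+1,n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
d_{n1} & d_{n2} & \cdots & d_{n,k-1} & s_{nk} & s_{n,k+1} & \cdots & s_{nn}
\end{array} \right)
$$

The secondary subscripts of the matrix have been omitted in $A_{k-1}$
for brevity. The variables having subscripts 1, 2, ... k-1 are
assumed to be in regression (due to the assumption that the theorem
holds for $A_{k-1}$) and hence the appropriate secondary subscripts should
be assumed to be attached to the various elements.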
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By  inspection  of  the  transformation  in  relation  to  the 
elements  stored  in  the  nine  partitions  on  which  the  transformation 
acts,  it  is  seen  that  the  application  of  T^  is  identical  to  the 
application  of  the  nine  recurrence  formulas  1  through  9*  Furthermore 
the  application  of  the  nine  recurrence  formulas  to  ^  is  equivalent 
to  replacing  A^_1  with  A^.  The  same  holds  true  for  k  =  1  and  hence  the 
proof  is  complete. 


In  a  similar  fashion  it  can  be  shown  that,  as  a  consequence  of
the  nine  recurrence  formulas  for  backwards  recursion,  i.e.,  10  through
18,  the  application  of  $T_k$  to  $A_k$  yields  $A_{k-1}$.


The  consequence  of  the  above  theorem  can  be  generalized  as
follows:  Consider  the  collection  of  values  taken  by  k  in  the
successive  applications  of  $T_k$.  A  variable  $X_k$  is  said  to  be  in
regression  if  k  appears  an  odd  number  of  times  in  the  collection.
Alternatively,  a  variable  is  said  not  to  be  in  regression  if  its
subscript  does  not  appear  in  the  collection,  or  if  it  appears  an  even
number  of  times.  The  content  of  the  matrix  at  any  stage  is  as
follows:


$a_{ij} = s_{ij.-}$   when  neither  $X_i$  nor  $X_j$  is  in  regression.


$a_{ij} = b_{ji.-}$   when  $X_i$  is  in  regression  but  not  $X_j$.


$a_{ij} = d_{ij.-}$   when  $X_j$  is  in  regression  but  not  $X_i$.


$a_{ij} = c_{ij.-}$   when  both  $X_i$  and  $X_j$  are  in  regression.


The  secondary  subscripts  are  those  appropriate  to  the  particular 
variables  in  the  regression  at  that  stage.  A  bookkeeping  method  for 
determining  which  variables  are  in  regression  will  be  described  in 
Section  VI. 


The  Correlation  Matrix 


For  computational  reasons  it  is  desirable  to  transform  the
initial  matrix  $A_0$  (the  covariance  matrix)  by  dividing  each  element
$a_{ij}$  by  $s_i s_j$,  where  $s_i = s_{ii}^{1/2}$.  The  resulting  matrix  is  a
matrix  of  simple  correlation  coefficients  $r_{ij}$,  i,  j  =  1,  2,  ...  n,
where


$r_{ij} = s_{ij} / (s_i s_j)$.


The  diagonal  elements  of  $A_0$  are  then  unity  and  the  remaining  elements
are  of  a  more  uniform  order  of  magnitude.  The  recurrence  formulas
remain  valid  as  shown  below:


Consider  the  regression  equation

$X_n/s_n = B_1 (X_1/s_1) + B_2 (X_2/s_2) + \cdots + B_k (X_k/s_k)$.

By  inspection  it  is  seen  that  the  covariance  matrix  for  this  system  is
equal  to  the  correlation  matrix  defined  above.  The  coefficients  $B_i$
are  those  that  arise  when  $A_0$  is  the  correlation  matrix.  Hence  the
coefficient  $b_{ni.q}$  is  computed  from  the  formula

$b_{ni.q} = B_{ni.q} \, s_n / s_i$.

Likewise,  for  the  inverse  elements,

$c_{ij.qij} = C_{ij.qij} / (s_i s_j)$.
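
The scaling just described can be sketched in a few lines of Python. This is only an illustrative sketch and not the BRLESC program: the function names `to_correlation` and `unscale_coefficient` are hypothetical, and the data are the 4-variable covariance matrix $A_0$ of the numerical example given later in the report. With a single regressor the standardized coefficient is simply the correlation coefficient, so the sketch uses $r_{43}$ as the value of $B$.

```python
import math

def to_correlation(cov):
    """Divide each covariance s_ij by s_i * s_j, where s_i = sqrt(s_ii)."""
    n = len(cov)
    s = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
    return corr, s

def unscale_coefficient(B, s_n, s_i):
    """Recover b = B * s_n / s_i from a standardized coefficient B."""
    return B * s_n / s_i

# Covariance matrix A_0 of the report's numerical example (elements are x/25).
A0 = [[v / 25 for v in row] for row in (
    (370, 475, 150, 1455),
    (475, 1700, -400, -1000),
    (150, -400, 1250, 4750),
    (1455, -1000, 4750, 21070),
)]

R, s = to_correlation(A0)

# Regression of X_4 on X_3 alone: the standardized coefficient is r_43,
# and unscaling it gives the ordinary coefficient s_43 / s_33.
b43 = unscale_coefficient(R[3][2], s[3], s[2])
print(round(b43, 6))
```

Working on `R` instead of `A0` keeps every diagonal element at unity at the start, which is the numerical motivation given in the text; the ordinary coefficients are recovered only at output time.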


V.  SELECTING  THE  KEY  VARIABLE


In  forward  stepwise  regression  the  variable  which  is  entered
into  regression  is  the  one  which  yields  the  greatest  reduction  in  the
variance  of  residuals  at  that  stage.  For  an  arbitrary  variable  $X_i$
that  is  not  in  regression  it  is  seen  from  the  recurrence  formula  9
that  the  variance  reduction  is  given  by  the  quantity


$V_i = a_{in} a_{ni} / a_{ii} = s_{in.q} s_{ni.q} / s_{ii.q}$.     (41)


For  an  arbitrary  variable  $X_i$  that  is  in  regression  the  variance
increase  resulting  from  the  removal  of  $X_i$  from  regression  is  given,
in  magnitude,  by  recurrence  formula  18:


$V_i = a_{in} a_{ni} / a_{ii} = d_{ni.q} b_{ni.q} / c_{ii.qi}$.     (42)


For  $X_i$  not  in  regression  $V_i$  is  positive  and  for  $X_i$  in
regression  $V_i$  is  negative.


After  determining  the  key  element  it  is  necessary  to  test
whether  the  variance  reduction  due  to  entering  the  key  variable  is
statistically  significant.  By  inspection  of  9  it  is  seen  that  for
i  =  j  =  n


$s_{nn.qk} = s_{nn.q} - s_{nk.q} s_{kn.q} / s_{kk.q}$.     (43)
The  quantity  $(s_{nk.q} s_{kn.q} / s_{nn.q} s_{kk.q})^{1/2}$  is  defined  as
the  product  moment  coefficient  of  correlation  between  $X_{n.q}$  and
$X_{k.q}$.  This  quantity  is  denoted  as  $r_{nk.q}$  and  is  often  referred
to  as  a  partial  correlation  coefficient.  Equation  (43)  can  be  written
in  the  form

$r^2_{nk.q} = s_{nk.q} s_{kn.q} / (s_{nn.q} s_{kk.q}) = (s_{nn.q} - s_{nn.qk}) / s_{nn.q}$.     (44)

By  inspection  $r^2_{nk.q}$  gives  the  fractional  variance  reduction
obtained  by  adding  $X_k$  into  the  regression.  If  $r_{nk.q}$  is
statistically  different  from  zero,  then  we  observe  that  the  fractional
variance  reduction  due  to  $X_k$  is  significant  and  that  $X_k$  should  be
brought  into  regression.  For  forward  recursion  $r^2_{nk.q}$  can  be
computed  directly  from  the  first  expression  of  (44).  For  backwards
recursion,  i.e.,  to  test  whether  a  variable  can  be  removed  from
regression,  $r^2_{nk.q}$  can  be  computed  from  the  formula


$r^2_{nk.q} = |V_k| / (s_{nn.qk} + |V_k|)$,     (45)

where  $|V_k|$  is  the  variance  increase  of  (42)  and  $s_{nn.qk}$  is  the
variance  of  residuals  with  $X_k$  still  in  regression.

A  test  of  significance  for  $r_{nk.q}$  is  listed  by  Graybill  [7].  If  the
true  coefficient  $\rho_{nk.q}$,  for  which  $r_{nk.q}$  is  an  estimate,  is
zero  the  quantity

$t = r_{nk.q} (f-2)^{1/2} / (1 - r^2_{nk.q})^{1/2}$     (46)

is  distributed  as  the  Student  t  distribution.  A  test  of  the  hypothesis
$\rho_{nk.q} \neq 0$  against  the  alternative  $\rho_{nk.q} = 0$  is  performed
as  follows:  The  quantity  t  is  compared  against  the  one-tailed  t
statistic,  t(f-2,c),  appropriate  to  the  degrees  of  freedom,  f,  and  the
confidence  level,  c.  The  hypothesis  is  accepted  if  t  >  t(f-2,c).
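
The test can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the function names are hypothetical, indices are 0-based, the dependent variable is taken to be the last row and column of the matrix, and the tabulated value t(f-2,c) is supplied by the caller rather than computed. The data are the covariance matrix $A_0$ of the report's numerical example, tested for the entry of $X_3$ with f = 5.

```python
import math

def partial_r2_forward(a, k):
    """r^2 for a candidate X_k not yet in regression: a_nk * a_kn / (a_nn * a_kk)."""
    n = len(a) - 1                      # index of the dependent variable
    return a[n][k] * a[k][n] / (a[n][n] * a[k][k])

def t_statistic(r2, f):
    """t = r * (f - 2)^(1/2) / (1 - r^2)^(1/2), using the positive root r."""
    if r2 >= 1.0:
        return math.inf                 # perfect correlation
    return math.sqrt(r2 * (f - 2) / (1.0 - r2))

def is_significant(r2, f, t_crit):
    """Compare t with the tabulated one-tailed value t(f-2, c) given by the caller."""
    return t_statistic(r2, f) > t_crit

A0 = [[v / 25 for v in row] for row in (
    (370, 475, 150, 1455),
    (475, 1700, -400, -1000),
    (150, -400, 1250, 4750),
    (1455, -1000, 4750, 21070),
)]

r2 = partial_r2_forward(A0, 2)          # candidate X_3
t = t_statistic(r2, 5)
print(round(r2, 3), round(t, 2), is_significant(r2, 5, 2.35))
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; in practice it would come from a t table at the chosen confidence level.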

The  test  is  used  in  two  ways:

(A)  At  the  beginning  of  a  stage  $V_i$  is  computed  for  all
subscripts,  i  =  1,  2,  ...  n-1.  The  largest  positive  $V_i$  identifies  the
key  variable  which  should  be  tested  for  entering  into  the  regression.
The  quantity  $r_{nk.q}$  is  computed  using  Equation  (44)  and  the  t  test
described  above  is  performed.  If  t  >  t(f-2,c)  the  variable  is
entered  into  regression  by  performing  the  transformation  $T_k$.

(B)  The  second  part  of  the  stage  begins  by  again  computing  $V_i$
for  all  i.  The  negative  $V_i$  identify  the  variables  that  are  in
regression.  The  negative  $V_i$  of  smallest  magnitude  identifies  the  key
variable  to  test  for  removal.  $r_{nk.q}$  is  computed  using  Equation  (45).
If  t  >  t(f-2,c)  the  correlation  is  significant  and  the  variable
should  remain  in  regression.  If  t  <  t(f-2,c)  the  variable  can  be
removed  from  regression  without  significantly  increasing  the  variance
of  residuals.  $X_k$  is  removed  from  the  regression  by  applying  $T_k$.  The
procedure  is  repeated  until  all  insignificant  variables  have  been
removed.
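
The two selection rules in (A) and (B) can be sketched as below. The function names are hypothetical and indices are 0-based, with the dependent variable in the last row and column. `A0` is the covariance matrix of the report's numerical example; `A1` is the matrix obtained from it by pivoting on $a_{33}$, that is, with $X_3$ in regression (its values are an assumption of this sketch in the sense that they are recomputed from `A0`, not copied from a table).

```python
def variance_changes(a):
    """V_i = a_in * a_ni / a_ii for every independent variable i."""
    n = len(a) - 1
    return [a[i][n] * a[n][i] / a[i][i] for i in range(n)]

def key_to_enter(v):
    """Largest positive V_i: candidate for entry, or None if none is positive."""
    best = max(range(len(v)), key=lambda i: v[i])
    return best if v[best] > 0 else None

def key_to_remove(v):
    """Negative V_i of smallest magnitude: candidate for removal, or None."""
    neg = [i for i in range(len(v)) if v[i] < 0]
    return max(neg, key=lambda i: v[i]) if neg else None

A0 = [[x / 25 for x in row] for row in (
    (370, 475, 150, 1455),
    (475, 1700, -400, -1000),
    (150, -400, 1250, 4750),
    (1455, -1000, 4750, 21070),
)]
A1 = [[x / 25 for x in row] for row in (
    (352, 523, -3, 885),
    (523, 1572, 8, 520),
    (3, -8, 0.5, 95),
    (885, 520, -95, 3020),
)]

v0 = variance_changes(A0)   # nothing in regression: all V_i positive
v1 = variance_changes(A1)   # X_3 in regression: its V_i is negative
print([round(x, 1) for x in v0], key_to_enter(v0))
print([round(x, 1) for x in v1], key_to_enter(v1), key_to_remove(v1))
```

The sign of each $V_i$ alone tells the two rules apart, so no separate list of entered variables is consulted here.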


The  modification  of  (A)  and  (B)  above  for  backward  regression
is  quite  simple.  Initially  the  recursion  is  controlled  to  proceed  all
the  way  forward,  yielding  the  inverse  of  the  covariance  matrix.  On  the
way  back,  after  any  variable  is  removed,  the  determination  is  made  as
to  whether  a  variable  removed  previously  has  become  significant;  if  so,
it  is  reentered.  If  not,  then  the  least  significant  variable  in
regression  is  removed,  provided  again  that  the  resulting  variance
increase  is  not  significant.  As  in  the  forward  version,  the  procedure
continues  until  the  equilibrium  point  is  reached.


VI.  IMPROVEMENT  OF  THE  ALGORITHM 


The  algorithm  described  by  Efroymson  requires  $n^2$  words  of
storage  for  the  covariance  matrix  and  the  successive  matrices  that  are
generated  as  the  regression  proceeds.  For  problems  requiring  only  a
few  variables  in  the  candidate  model,  this  storage  requirement  creates
no  difficulty  on  modern  computing  machinery.  The  author  has  been
involved  in  problems  (see  for  example  BRL  Report  No.  13W,  [2])  where
it  was  necessary  to  examine  candidate  models  containing  96  variables.
Fortunately  the  machine  used  on  this  problem,  the  Ballistic  Research
Laboratories  BRLESC,  has  over  30,000  words  of  built-in  double  precision
storage,  i.e.,  the  standard  word  length  in  this  computer  is  68  binary
bits  or  approximately  20  decimal  digits.  Most  commercial  machines  have
word  lengths  of  only  8  or  10  decimal  digits.  The  experience  of  various
computing  facilities  on  large  scale  matrix  problems  done  on  commercial
machines  is  that  double  precision  computations  are  required  to  avoid
the  computational  problem  associated  with  roundoff.  The  details  of
this  roundoff  phenomenon  as  it  arises  with  polynomial  models  are
discussed  by  Ralston  [15],  page  233.

The  necessity  of  doing  a  stepwise  multiple  regression  program
in  double  precision  reduces  the  available  storage  by  a  factor  of  two
and  accordingly  limits  the  size  of  the  model  which  can  be  analyzed  by
a  factor  of  the  square  root  of  two.  The  modified  algorithm  derived
below  has  been  implemented  in  the  BRLESC  program  described  in  [3]  and
requires  only  $(n^2 + 7n - 2)/2$  words  of  storage.  In  addition  the
computations  related  to  the  application  of  the  recursion  formulas  are
halved,  thus  requiring  less  computer  time.
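
As a worked illustration of these two storage counts, take n = 96. This figure is used purely as an example, suggested by the 96-variable problem mentioned above; whether that count includes the dependent variable is not stated.

```latex
\begin{aligned}
n^2 &= 96^2 = 9216, \\
\frac{n^2 + 7n - 2}{2} &= \frac{9216 + 672 - 2}{2} = 4943, \\
\frac{4943}{9216} &\approx 0.54 .
\end{aligned}
```

For this n the modified algorithm therefore needs a little over half the words of the full-matrix form, and the ratio approaches one half as n grows.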

In  problems  involving  symmetric  matrices  it  is  common  to  take
advantage  of  the  symmetry  to  reduce  computations  and  storage.  This  is
especially  true  of  least-squares  computations  since  the  covariance
matrix  is  symmetric.  The  matrices  involved  in  stepwise  multiple
regression  are  not  symmetric,  but  might  be  termed  pseudo  symmetric,
i.e.,  $|a_{ij}| = |a_{ji}|$;  the  elements  are  symmetric  in  absolute  value.
Except  for  signs,  all  the  statistical  information  stored  in  the  matrix
is  contained  in  the  upper  triangular  part  of  the  matrix  and  the
diagonal.  The  justification  for  storing  the  lower  triangular  matrix
(and  subsequently  operating  on  it)  seemingly  is  that  the  signs  contained
in  the  lower  triangular  matrix  are  used  to  indicate  which  variables
are  in  regression  and  which  are  not.  To  keep  track  of  which  variables
are  in  regression  one  can  store  a  sequence  of  numbers
$z_1, z_2, \ldots, z_n$.  The  presence  of  a  variable  $X_i$  in  regression  is
denoted  by  the  presence  of  -1  in  $z_i$.  Initially  $z_1, z_2, \ldots, z_n$
are  all  +1  to  denote  no  variables  in  regression.  As  a  variable  is
entered  into  regression  or  removed,  $z_i$  is  multiplied  by  -1.  If  $z_i$
is  operated  on  an  even  number  of  times  this  means  that  $X_i$  was
removed  from  regression  as  often  as  it  was  entered  and  hence  is  not  in.
This  would  be  so  indicated  by  $z_i$  since  $z_i$  would  be  equal  to
$(-1)^{2r} = +1$.  Alternatively  if  $z_i$  is  operated  on  an  odd  number  of
times  $z_i$  is  equal  to  $(-1)^{2r+1} = -1$.  This  indicates  $X_i$  is  in
regression.


One  additional  problem  remains.  The  transformation  of
elements  in  the  upper  triangular  matrix  using  $T_k$  involves  elements
which  by  storage  implications  are  in  the  lower  triangular  matrix.  Since
it  is  desired  to  modify  the  algorithm  so  that  the  lower  triangular
matrix  will  not  be  stored,  some  method  is  needed  to  determine  the  signs
of  the  elements  below  the  diagonal.  The  elements  $c_{ij} = c_{ji}$  and
$s_{ij} = s_{ji}$.  If  $a_{ij}$  is  a  regression  coefficient,
$a_{ij} = b_{ji} = -d_{ji}$.  Hence  we  note  that  $a_{ij} = -a_{ji}$  if  exactly
one  of  $X_i$  and  $X_j$  is  in  regression,  but  $a_{ij} = a_{ji}$  if  both  are
in  regression  or  if  neither  is  in  regression.  By  inspection  of  $T_k$  it
is  seen  that  the  only  elements  involved  in  transforming  $a_{ij}$  are
$a_{ij}$  itself  and  other  elements  which  lie  either  in  row  k  or
column  k.  This  leads  one  to  look  for  a  way  of  "filling  in"  row  k  and
column  k  below  the  diagonal  with  proper  signs  at  the  beginning  of  the
stage.  This  is  most  conveniently  done  by  storing  the  row  and  column  in
separate  storage  as  elements  $t_{ij}$.  If  $t_{ij}$  is  on  or  above  the
diagonal  then  $t_{ij} = a_{ij}$.  Hence  two  rules  are  immediately  apparent.


$t_{kj} = a_{kj}$,   j  =  k,  k+1,  ...  n          Upper  triangle  row  k

$t_{ik} = a_{ik}$,   i  =  1,  2,  ...  k-1          Upper  triangle  column  k

By  inspection  it  is  seen  that  below  the  diagonal  $t_{ij}$  is  obtained
in  magnitude  from  $a_{ji}$  and  in  sign  from  $z_i z_j$.  This  leads  to  the
additional  two  rules

$t_{kj} = z_k z_j a_{jk}$,   j  =  1,  2,  ...  k-1          Lower  triangle  row  k

$t_{ik} = z_i z_k a_{ki}$,   i  =  k+1,  k+2,  ...  n        Lower  triangle  column  k

Equations  (38)  are  then  used  to  generate  the  new  upper  triangular 
matrix.  The  complete  algorithm  is  as  follows: 


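$t_{kj} = a_{kj}$,   j  =  k,  k+1,  ...  n

$t_{ik} = a_{ik}$,   i  =  1,  2,  ...  k-1

$t_{kj} = z_k z_j a_{jk}$,   j  =  1,  2,  ...  k-1

$t_{ik} = z_i z_k a_{ki}$,   i  =  k+1,  k+2,  ...  n

$a'_{ij} = a_{ij} - t_{ik} t_{kj} / t_{kk}$,   i  =  1,  2,  ...  k-1,  k+1,  ...  n;   j  $\geq$  i,  j  $\neq$  k

$a'_{kj} = t_{kj} / t_{kk}$,   j  =  k+1,  k+2,  ...  n

$a'_{ik} = -t_{ik} / t_{kk}$,   i  =  1,  2,  ...  k-1

$a'_{kk} = 1 / t_{kk}$,   $z'_k = -z_k$,   i  =  j  =  k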

The  primes  denote  the  elements  of  the  new  matrix. 
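
The steps above can be sketched in Python and checked against an ordinary full-matrix pivot. This is an illustrative sketch only and not the BRLESC implementation: the names `sweep_full` and `sweep_upper` are hypothetical, indices are 0-based, and for simplicity the upper triangle is held in a square list with `None` below the diagonal rather than in packed storage. Exact rational arithmetic is used so that the comparison is free of roundoff. The data are the covariance matrix $A_0$ of the report's numerical example.

```python
from fractions import Fraction

def sweep_full(a, k):
    """Reference T_k on a full n x n matrix (pivot on a[k][k])."""
    n, p = len(a), a[k][k]
    out = [[a[i][j] - a[i][k] * a[k][j] / p for j in range(n)] for i in range(n)]
    for j in range(n):
        out[k][j] = a[k][j] / p        # pivot row
        out[j][k] = -a[j][k] / p       # pivot column
    out[k][k] = 1 / p
    return out

def sweep_upper(u, z, k):
    """T_k using only u[i][j] with j >= i, plus the sign vector z (in place)."""
    n = len(z)
    # Fill in row k and column k: stored values on or above the diagonal,
    # mirrored values with sign z_i * z_j below it.
    t_row = [u[k][j] if j >= k else z[k] * z[j] * u[j][k] for j in range(n)]
    t_col = [u[i][k] if i <= k else z[i] * z[k] * u[k][i] for i in range(n)]
    p = t_row[k]
    for i in range(n):
        for j in range(i, n):
            if i == k and j == k:
                u[i][j] = 1 / p
            elif i == k:
                u[i][j] = t_row[j] / p
            elif j == k:
                u[i][j] = -t_col[i] / p
            else:
                u[i][j] -= t_col[i] * t_row[j] / p
    z[k] = -z[k]

A0 = [[Fraction(v, 25) for v in row] for row in (
    (370, 475, 150, 1455),
    (475, 1700, -400, -1000),
    (150, -400, 1250, 4750),
    (1455, -1000, 4750, 21070),
)]

full = sweep_full(sweep_full(A0, 2), 0)          # enter X_3, then X_1
u = [[A0[i][j] if j >= i else None for j in range(4)] for i in range(4)]
z = [1, 1, 1, 1]
sweep_upper(u, z, 2)
sweep_upper(u, z, 0)
print(z)
```

Any element below the diagonal can be recovered on demand as `z[i] * z[j] * u[j][i]`, which is what makes it unnecessary to store or update the lower triangle.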


VII.  A  COMPARISON  OF  FORWARD  AND  BACKWARD  STEPWISE  REGRESSION 


Hamaker  [10],  1962,  compared  forward  and  backward  stepwise
regression  on  data  taken  from  Hald  [9].  This  data  concerned  the  heat
evolved  during  the  hardening  of  cement.  The  problem  involved  four
independent  variables  $X_1$,  $X_2$,  $X_3$  and  $X_4$.  The  optimum  model  in
this  problem  contains  the  variables  $X_1$  and  $X_2$.  In  Hamaker's  version
of  "forward  selection"  the  variables  were  entered  into  the  regression  in
the  order  $X_4$,  $X_1$,  $X_2$,  $X_3$  and  in  his  "backward  elimination"  the
variables  were  eliminated  in  the  order  $X_3$,  $X_4$,  $X_1$,  $X_2$.  He
concludes  that  if  a  model  containing  two  variables  were  selected  the
forward  version  would  yield  the  model  containing  $X_1$  and  $X_4$  while
the  backward  version  would  yield  the  optimum  model  containing  the
variables  $X_1$  and  $X_2$.  Hamaker  made  no  provision  for  removing
variables  as  they  became  insignificant  and  in  fact,  a  forward  procedure
which  does  provide  this  capability  would  in  this  example  have  arrived
at  the  optimum  model.

The  author  analysed  Hald's  data  using  the  computer  program  described
in  [3]  and  obtained  the  results  listed  below.
STAGE    ACTION  TAKEN    VARIABLES  IN  REGRESSION    STD.  DEV.
                          AT  END  OF  STAGE           RESIDUALS

  0      -                -                            15.04
  1      Add  X4          X4                            8.96
  2      Add  X1          X4,  X1                       2.73
  3      Add  X2          X4,  X1,  X2                  2.31
  4      Remove  X4       X1,  X2                       2.41

The  decisions  to  add  or  remove  variables  were  made  at  the  95%  level
of  significance.  It  is  quite  possible  that  at  other  levels  of
significance  different  results  might  be  obtained  and  in  fact  in
Section  IX  an  example  is  listed  showing  that  even  for  a  "perfect  fit"
model  the  forward  version  does  not  obtain  the  optimum  model  whereas  the
backward  version  does.


Abt*  et  al  [1]  discuss  the  forward  and  backward  versions  and
attribute  the  occurrence  of  different  results  to  the  presence  of
"compounds".  They  define  a  compound  as

    a  set  of  N'  ≤  N  independent  variables  plus  the  dependent
    variable  when  the  error  variance  associated  with  all  N'
    independent  variables  is  smaller,  by  orders  of  magnitude,
    than  the  error  variance  associated  with  any  subset  of
    N'-1  independent  variables.

Their  discussion,  however,  seems  to  be  based  on  a  stepwise  procedure
which  does  not  allow  for  the  removal  of  terms  in  the  forward  version,
nor  for  the  subsequent  addition  of  variables  that  have  been  eliminated
in  the  backward  version.  The  end  result  of  a  regression  run  on  Abt
et  al's  program,  as  in  Hamaker's  example,  is  an  ordering  of  the
variables  in  either  a  forward  or  backward  ranking.  The  ranking  in  the
end  has  really  no  meaning  in  regard  to  the  relative  importance  of  the
variables'  contributions  to  the  variance  reduction.  The  author,  for
example,  has  observed  the  following  phenomenon:  In  six  stages  of  a
forward  run,  five  stages  consisted  of  removing  variables  that  had
entered  earlier.  In  this  problem,  variables  that  in  the  end  were
insignificant  would  have  been  highly  ranked  had  they  not  been  tested  for
removal.

*Also  discussed  in  a  paper  titled  "On  the  Identification  of  the
Significant  Independent  Variables  in  Linear  Models"  by  Klaus  Abt,
soon  to  be  published  in  Metrika.  Dr.  Abt  provided  the  author  a
preprint  of  this  paper.

The  objective  in  multiple  linear  regression  analysis  is  to
obtain  a  "prediction  model"  as  near  optimum  as  is  practical,  and
the  ordering  as  discussed  above  is  of  interest  only  in  relation  to  the
information  it  provides  in  achieving  this  end.  In  this  context  a
provision  for  removing  terms  in  the  forward  version  seems  to  be  more
effective  toward  achieving  this  goal  than  a  forward  procedure  which
merely  orders  the  variables  in  the  sequence  which  produces  the
greatest  reduction  in  the  sum  of  squares  of  residuals.  Similarly,  the
backward  version  should  seemingly  include  a  provision  for  reentering
variables  if  they  subsequently  become  significant  after  their  removal.

The  cost  of  running  regression  problems  on  today's  modern
machinery  is  so  small  that  it  seems  for  many  problems  one  might
fruitfully  apply  both  versions  for  comparison.  When  many  observations
are  involved  in  relation  to  the  number  of  variables  the  formation  of
the  covariance  matrix  seems  to  comprise  the  bulk  of  the  computation
time.  On  a  problem  involving  96  variables  and  1439  observations  the
BRLESC  program  [3]  ran  5.34  minutes  in  the  forward  version,  entering
21  variables  before  reaching  equilibrium.  When  the  program  was
modified  to  take  advantage  of  the  modified  algorithm  derived  earlier
this  same  problem  ran  in  4.90  minutes.  From  these  figures  it  is
estimated  that  the  formation  of  the  covariance  matrix  required  about
4.5  minutes  and  that  a  complete  forward  regression  would  take
approximately  2.0  minutes  with  a  similar  estimate  for  the  time  required
to  do  a  backward  regression.  Most  problems  are  of  a  much  smaller  scale
and  running  time  considerations  are  usually  unimportant.
$$
A_0 = \frac{1}{25}
\begin{pmatrix}
370 & 475 & 150 & 1455 \\
475 & 1700 & -400 & -1000 \\
150 & -400 & 1250 & 4750 \\
1455 & -1000 & 4750 & 21070
\end{pmatrix}
$$

At  the  first  stage  the  test  quantities  for  the  reduction  in  the  sum
of  squares  of  residuals  are  given  by

$V_1 = a_{14} a_{41} / a_{11} = \frac{1}{25} (1455)^2 / 370 = 228.9$,

$V_2 = a_{24} a_{42} / a_{22} = \frac{1}{25} (1000)^2 / 1700 = 23.5$,

$V_3 = a_{34} a_{43} / a_{33} = \frac{1}{25} (4750)^2 / 1250 = 722.0$.

Since  $V_3$  is  the  largest  of  the  three  test  quantities,  $X_3$  becomes
the  key  variable.  To  test  whether  this  variable  will  significantly
reduce  the  sum  of  squares  of  residuals  we  obtain  the  coefficient
$r_{43}$.


$r^2_{43} = a_{43} a_{34} / (a_{33} a_{44}) = (4750)^2 / ((1250)(21070)) = .857$


$t = r_{43} (f-2)^{1/2} / (1 - r^2_{43})^{1/2} = 4.24$


$t(f-2, .95) = t(3, .95) = 2.35$

Since  t  >  t(f-2,.95)  the  test  for  adding  the  variable  indicates  that
(at  the  95%  level  of  confidence)  $X_3$  should  be  brought  into  the
regression.  After  operating  on  $A_0$  with  the  Gauss-Jordan  algorithm  with
$a_{33}$  as  the  pivot  we  obtain

$$
A_1 = \frac{1}{25}
\begin{pmatrix}
352 & 523 & -3 & 885 \\
523 & 1572 & 8 & 520 \\
3 & -8 & 1/2 & 95 \\
885 & 520 & -95 & 3020
\end{pmatrix}
$$

The  test  quantities  are


$V_1 = \frac{1}{25} (885)^2 / 352 = 89.0$,

$V_2 = \frac{1}{25} (520)^2 / 1572 = 6.88$.

The  key  variable  by  inspection  is  $X_1$.

$r^2_{41.3} = (885)^2 / ((352)(3020)) = .737$

$t = r_{41.3} (f-2)^{1/2} / (1 - r^2_{41.3})^{1/2} = 2.37$

$t(f-2, .95) = t(2, .95) = 2.92$

Since  t  <  t(f-2,.95)  the  test  for  addition  fails  and  the  variable  $X_1$
is  not  entered  into  regression.  This  then  is  the  equilibrium  point
and  the  model  which  a  forward  stepwise  procedure  would  yield  is

$\hat{X}_4 = \bar{X}_4 + b_3 (X_3 - \bar{X}_3)$,

$b_3 = b_{43} = 95/25 = 3.8$,

$b_0 = \bar{X}_4 - b_3 \bar{X}_3 = 39/5 - (2) 95/25 = .2$,

$\hat{X}_4 = .2 + 3.8 X_3$.

Note  that  in  this  example  no  tests  for  removal  were  necessary.

It  is  not  necessary  to  do  the  complete  computations  to  exhibit
the  result  for  the  backward  version.  One  of  the  three  variables
(assume  $X_2$)  will  be  the  key  variable  to  test  for  removal.  The
partial  correlation  coefficient  is  computed  from  Equation  (45).

$r^2_{42.13} = |V_2| / (s_{44.123} + |V_2|)$

Since  $s_{44.123} = 0$,  the  coefficient  is  1.0  indicating  perfect
correlation.  This  would  be  true  for  any  of  the  three  variables.
Obviously,  no  variable  is  removed  and  the  equilibrium  point  is
established  with  all  three  variables  in  regression.
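
The two outcomes of this example can be checked with a short sketch in exact rational arithmetic. The function name `sweep` is hypothetical and stands for a plain full-matrix Gauss-Jordan pivot, indices are 0-based, and the means 39/5 and 2 are taken from the arithmetic quoted in the text. Pivoting $A_0$ on $a_{33}$ gives the one-variable forward model, and pivoting on all three independent variables leaves an exactly zero variance of residuals.

```python
from fractions import Fraction

def sweep(a, k):
    """Full-matrix Gauss-Jordan pivot on a[k][k] (enters or removes X_k)."""
    n, p = len(a), a[k][k]
    out = [[a[i][j] - a[i][k] * a[k][j] / p for j in range(n)] for i in range(n)]
    for j in range(n):
        out[k][j] = a[k][j] / p        # pivot row
        out[j][k] = -a[j][k] / p       # pivot column
    out[k][k] = 1 / p
    return out

A0 = [[Fraction(v, 25) for v in row] for row in (
    (370, 475, 150, 1455),
    (475, 1700, -400, -1000),
    (150, -400, 1250, 4750),
    (1455, -1000, 4750, 21070),
)]

# Forward version: only X_3 enters.
A1 = sweep(A0, 2)
b3 = A1[2][3]                          # regression coefficient of X_4 on X_3
b0 = Fraction(39, 5) - b3 * 2          # intercept from the quoted means

# Backward version: all three independent variables are in regression.
A3 = sweep(sweep(A1, 0), 1)
s44 = A3[3][3]                         # variance of residuals, s_44.123

print(A1[0][0] * 25, b3, b0, s44)
```

Because the arithmetic is exact, the zero obtained for the residual variance is not a rounding effect; it reflects the "perfect fit" property of the data that the forward version fails to discover.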

Recent  Work  in  Europe 

After  the  completion  of  this  manuscript  the  author  attended  a
seminar  titled,  "A  New  Computer  Approach  in  Determining  Optimum
Regression  in  Multivariate  Analysis."  The  lecturer  was  Dr.  M.  G.
Kendall,  the  noted  British  statistician.  The  new  approach  referred  to
in  the  seminar  title  was  a  modification  of  the  technique  described  by
Lotto  and  Garside  in  enumerating  the  $2^N - 1$  regressions.  Kendall  and
his  coworkers  have  developed  an  algorithm  which  is  more  economical  than
the  recursive  generation  of  the  $2^N - 1$  regressions  by  noting  that  it  is
possible  to  identify  (without  performing  the  computations)  certain
useless  combinations  which  are  demonstrably  worse  than  combinations  for
which  regressions  have  already  been  obtained.  The  details  of  this
algorithm  can  be  found  in  the  paper  "The  Discarding  of  Variables  in
Multivariate  Analysis"  by  E.  M.  L.  Beale,  M.  G.  Kendall  and  D.  W.  Mann,
copies  of  which  were  distributed  at  the  seminar*.  This  technique  has
been  called  "partial  enumeration"  and  its  attractiveness  in  comparison
to  forward  and  backward  stepwise  regression  was  noted.  It  was  pointed
out,  as  was  done  earlier  in  this  thesis,  that  stepwise  regression  does
not  in  general  lead  to  the  optimum  model.  In  this  connection,
reference  was  made  to  a  paper  by  Oosterhoff**  (1963)  which  contains  an
example  for  which  the  forward  and  backward  methods  lead  to  the  same
model,  which  is  not  the  optimum.

*This  seminar  was  held  on  April  11,  1967  and  sponsored  by  C-E-I-R  Inc.,
5272  River  Road,  Washington,  D.C.

**Oosterhoff,  J.  (1963),  On  the  Selection  of  Independent  Variables  in  a
Regression  Equation,  Report  S  319  (VP23),  Mathematisch  Centrum,
Amsterdam.